zero-shot semantic segmentation
Open Vocabulary 3D Occupancy Prediction from Images Supplementary Material
In this supplementary material, we first give additional details about the method in Sec. 1. Table 1 lists the queries used for zero-shot semantic segmentation for all the annotated classes in the dataset (second column). One can see that, for example, the class name 'manmade' lacks descriptive specificity: the text description of this class mentions "... buildings, walls, guard rails, fences, poles, street signs, traffic lights ..." and more.
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (0.95)
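The entry above describes expanding a non-descriptive class name like 'manmade' into several descriptive text queries. A minimal sketch of that idea, assuming a query-per-class table and max-pooled cosine similarity (toy embeddings here stand in for a vision-language model such as CLIP; none of this is the paper's actual code):

```python
# Sketch: assign a visual feature to the class whose best-matching text
# query is most similar. QUERIES expands vague class names into several
# descriptive queries; embeddings are supplied externally.
import math

QUERIES = {
    "manmade": ["building", "wall", "guard rail", "fence", "pole"],
    "vegetation": ["tree", "bush", "grass"],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify(feature, text_embed):
    # Max-pool similarity over each class's queries, return the argmax class.
    best_class, best_sim = None, -1.0
    for cls, queries in QUERIES.items():
        sim = max(cosine(feature, text_embed[q]) for q in queries)
        if sim > best_sim:
            best_class, best_sim = cls, sim
    return best_class
```

Max-pooling over queries means a pixel only needs to match one of the descriptive terms ("fence", "pole", ...) to be labeled 'manmade', which is exactly why the query expansion helps for vague class names.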
What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation
While semantic segmentation has seen tremendous improvements in the past, significant labeling effort is still necessary, and generalization to classes that were not present during training remains limited. To address this problem, zero-shot semantic segmentation makes use of large self-supervised vision-language models, allowing zero-shot transfer to unseen classes. In this work, we build a benchmark for Multi-domain Evaluation of Zero-Shot Semantic Segmentation (MESS), which allows a holistic analysis of performance across a wide range of domain-specific datasets such as medicine, engineering, earth monitoring, biology, and agriculture. To do this, we reviewed 120 datasets, developed a taxonomy, and classified the datasets accordingly. We select a representative subset of 22 datasets and propose it as the MESS benchmark. We evaluate eight recently published models on the proposed benchmark and analyze which characteristics drive the performance of zero-shot transfer models.
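A multi-domain benchmark like the one described above needs a per-dataset score and a macro average so that large datasets do not dominate small ones. A minimal sketch of that aggregation, assuming mean IoU as the per-dataset metric (this is an illustration, not the MESS evaluation code):

```python
# Sketch: per-dataset mean IoU over flattened label arrays, plus an
# unweighted macro average across datasets.
def miou(pred, gt, num_classes):
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and label
            ious.append(inter / union)
    return sum(ious) / len(ious)

def benchmark_score(per_dataset):
    # per_dataset: {name: (pred_labels, gt_labels, num_classes)}
    scores = {n: miou(p, g, k) for n, (p, g, k) in per_dataset.items()}
    scores["mean"] = sum(scores.values()) / len(scores)
    return scores
```

Skipping classes absent from both prediction and ground truth keeps small domain-specific datasets, which rarely contain every class, from being unfairly penalized.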
Zero-Shot Semantic Segmentation
Semantic segmentation models are limited in their ability to scale to large numbers of object classes. In this paper, we introduce the new task of zero-shot semantic segmentation: learning pixel-wise classifiers for never-seen object categories with zero training examples. To this end, we present a novel architecture, ZS3Net, combining a deep visual segmentation model with an approach to generate visual representations from semantic word embeddings. In this way, ZS3Net addresses pixel classification tasks where both seen and unseen categories are faced at test time (so-called generalized zero-shot classification). Performance is further improved by a self-training step that relies on automatic pseudo-labeling of pixels from unseen classes. On two standard segmentation datasets, Pascal-VOC and Pascal-Context, we propose zero-shot benchmarks and set competitive baselines. For complex scenes, such as those in the Pascal-Context dataset, we extend our approach with a graph-context encoding to fully leverage the spatial context priors coming from class-wise segmentation maps.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.61)
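The core ZS3Net idea in the abstract above, generating visual features for unseen classes from their word embeddings and then training a classifier on them, can be caricatured in a few lines. In this sketch the "generator" is just the word embedding plus Gaussian noise and the classifier is nearest-prototype; the actual paper uses a learned generative network and a pixel-wise classifier:

```python
# Sketch: synthesize features for an unseen class from its word embedding,
# then classify real features against prototypes built from both real
# (seen) and generated (unseen) features.
import random

def generate_features(word_embedding, n, sigma=0.1, seed=0):
    rng = random.Random(seed)
    return [[w + rng.gauss(0.0, sigma) for w in word_embedding]
            for _ in range(n)]

def prototypes(features_by_class):
    protos = {}
    for cls, feats in features_by_class.items():
        dim = len(feats[0])
        protos[cls] = [sum(f[d] for f in feats) / len(feats)
                       for d in range(dim)]
    return protos

def predict(feature, protos):
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(protos, key=lambda c: dist2(feature, protos[c]))
```

The point of the construction is that seen and unseen classes end up competing in the same classifier, which is what makes the generalized zero-shot setting tractable.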
Uncertainty-Aware Learning for Zero-Shot Semantic Segmentation
Zero-shot semantic segmentation (ZSS) aims to classify pixels of novel classes without training examples available. Recently, most ZSS methods focus on learning the visual-semantic correspondence to transfer knowledge from seen classes to unseen classes at the pixel level. Yet, few works study the adverse effects caused by the noisy and outlying training samples in the seen classes. In this paper, we identify this challenge and address it with a novel framework that learns to discriminate noisy samples based on Bayesian uncertainty estimation.
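One common way to operationalize the Bayesian uncertainty estimation mentioned above is to run several stochastic forward passes per sample (MC-dropout style), treat the variance of the predictions as uncertainty, and down-weight uncertain samples in the loss. A minimal sketch under those assumptions (the weighting function and temperature are illustrative, not the paper's formulation):

```python
# Sketch: per-sample uncertainty as variance over stochastic passes,
# mapped to a loss weight in (0, 1] that shrinks as variance grows.
import math

def predictive_variance(passes):
    # passes: repeated probability estimates for the same sample
    mean = sum(passes) / len(passes)
    return sum((p - mean) ** 2 for p in passes) / len(passes)

def sample_weight(passes, temperature=10.0):
    # Consistent passes -> weight near 1; disagreeing passes -> small weight.
    return math.exp(-temperature * predictive_variance(passes))
```

Noisy or outlying training samples tend to produce disagreeing passes, so this weighting suppresses exactly the samples the abstract identifies as harmful.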
Semantic4Safety: Causal Insights from Zero-shot Street View Imagery Segmentation for Urban Road Safety
Chen, Huan; Han, Ting; Chen, Siyu; Guo, Zhihao; Chen, Yiping; Wu, Meiliu
Street-view imagery (SVI) offers a fine-grained lens on traffic risk, yet two fundamental challenges persist: (1) how to construct street-level indicators that capture accident-related features, and (2) how to quantify their causal impacts across different accident types. To address these challenges, we propose Semantic4Safety, a framework that applies zero-shot semantic segmentation to SVIs to derive 11 interpretable streetscape indicators, and integrates road type as contextual information to analyze approximately 30,000 accident records in Austin. Specifically, we train an eXtreme Gradient Boosting (XGBoost) multi-class classifier and use Shapley Additive Explanations (SHAP) to interpret both global and local feature contributions, and then apply Generalized Propensity Score (GPS) weighting and Average Treatment Effect (ATE) estimation to control confounding and quantify causal effects. Results uncover heterogeneous, accident-type-specific causal patterns: features capturing scene complexity, exposure, and roadway geometry dominate predictive power; larger drivable area and emergency space reduce risk, whereas excessive visual openness can increase it. By bridging predictive modeling with causal inference, Semantic4Safety supports targeted interventions and high-risk corridor diagnosis, offering a scalable, data-informed tool for urban road safety planning.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.16)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
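The causal step in the Semantic4Safety pipeline above weights samples by propensity scores before estimating treatment effects. The paper uses Generalized Propensity Scores for continuous streetscape indicators; the sketch below shows only the simpler binary-treatment special case (inverse-propensity-weighted ATE), with propensities assumed to be already estimated:

```python
# Sketch: inverse-propensity-weighted Average Treatment Effect for a
# binary treatment. propensities[i] = estimated P(T=1 | X_i).
def ipw_ate(outcomes, treatments, propensities):
    n = len(outcomes)
    treated = sum(y * t / e
                  for y, t, e in zip(outcomes, treatments, propensities)) / n
    control = sum(y * (1 - t) / (1 - e)
                  for y, t, e in zip(outcomes, treatments, propensities)) / n
    return treated - control
```

Reweighting by 1/e (treated) and 1/(1-e) (control) balances the covariate distributions of the two groups, which is what lets the estimate be read causally rather than as a raw correlation.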
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)